Modeling of Fundamental Frequency Contour of Thai Expressive Speech using Fujisaki’s Model and Structural Model
Abstract
Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since it affects not only the naturalness but also the intelligibility of speech. A number of systems focusing on the synthesis of Thai expressive speech have been developed over the years. However, the generation of expressive speech in various speaking styles has not yet been accomplished. To generate expressive speech, the fundamental frequency (F0) contour must be modeled accurately so that the quality of speech prosody is preserved. Approach: This study compares two successful F0 models. The first is Fujisaki’s model, which has been applied to many tonal and non-tonal languages. The second is the structural model, which has been developed primarily for Mandarin Chinese. It is based on the assumption that the behavioral characteristics of vocal-fold elongation in vibration can be approximated by those of a simple forced vibrating system. This approach is therefore applied to model Thai expressive speech with a best-fit function. The speech database consists of male and female speech, each covering four speech styles: angry, sad, enjoyable and reading. Five sentences are used for each speech style, and each sentence includes 100 samples. Each speech sample is analyzed to obtain its F0 contour, and a set of Fujisaki and structural model parameters is then extracted for each contour. The parameters are used to synthesize the F0 contour, and the synthesized contour is compared with that of natural speech by calculating the RMS error. Results: The experimental analysis shows that the RMS error differs among speech styles for both models. It also shows that the RMS error of Fujisaki’s model is higher than that of the structural model for all speech styles. In other words, the structural model fits the F0 contours of expressive speech better than Fujisaki’s model does. Conclusion: The findings provide clear evidence that the structural model is more appropriate than Fujisaki’s model for modeling the four speech styles: angry, sad, enjoyable and reading.
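The evaluation loop described in the abstract (generate an F0 contour from model parameters, then score it against the natural contour by RMS error) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses the commonly cited formulation of Fujisaki's model, in which ln F0 is the sum of a baseline, phrase components and accent (tone) components. The function names, the default values of `alpha`, `beta` and `gamma`, the command values in the usage lines, and the choice to compute the error in Hz are all assumptions made for the example. The structural model is not sketched because the abstract does not give its equations.

```python
import math


def phrase_response(t, alpha=2.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent (tone) control step response, clipped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(times, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Generate F0 values in Hz at the given times (seconds).

    fb      -- baseline frequency in Hz
    phrases -- list of (onset_time, amplitude) phrase commands
    accents -- list of (onset_time, offset_time, amplitude) accent commands
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrases:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accents:
            log_f0 += aa * (accent_response(t - t1, beta, gamma)
                            - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour


def rms_error(natural, synthesized):
    """Root-mean-square error between two equal-length F0 contours."""
    if not natural or len(natural) != len(synthesized):
        raise ValueError("contours must be non-empty and of equal length")
    squared = sum((n - s) ** 2 for n, s in zip(natural, synthesized))
    return math.sqrt(squared / len(natural))


# Hypothetical usage: compare two parameter sets on a 1-second, 10 ms grid.
times = [i * 0.01 for i in range(100)]
reference = fujisaki_f0(times, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
candidate = fujisaki_f0(times, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.3)])
print(rms_error(reference, candidate))
```

In practice the natural contour would come from a pitch tracker and only voiced frames would be scored. The same `rms_error` function could be applied to contours produced by any other F0 model, which is what makes it usable for a side-by-side comparison.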
Similar resources
Thai Expressive Speech Processing Technology: A Review
Problem statement: Studies on Thai expressive (emotional) speech have been conducted for years. Most of them aim to analyze the characteristics of Thai expressive speech. However, no conclusive review of these studies has been conducted to support further work on speech technology or applications of Thai expressive speech. Approach: The review of research on Thai expre...
Analytical Study on Fundamental Frequency Contours of Thai Expressive Speech Using Fujisaki’s Model
Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since it affects not only the naturalness but also the intelligibility of speech. A number of systems focusing on the synthesis of Thai expressive speech have been developed over the years. However, the expressive speech with various speaking styles has not been accomplishe...
Analytical Study of Fujisaki’s Model of Fundamental Frequency Contour for Thai Tones
Problem statement: In a tonal language, the tone of a syllable is an important prosodic feature that identifies the meaning of that syllable or part of a word. It is crucial to model the tone-related features of speech to achieve the most natural speech communication. Approach: The study presents an approach to analyzing the model parameters of Thai tones for two genders. The successive ...
Fujisaki’s Model of Fundamental Frequency Contours for Thai Dialects
Problem statement: There are a number of rural dialects in Thai. However, four dialects are mainly spoken by Thai people residing in the four core regions: central, north, northeast and south. Recognizing and synthesizing Thai speech in different dialects is consequently difficult. Approach: Prosody is an important factor that must be taken into account, since the pro...
Structural Modeling of Fundamental Frequency Contour for Thai Expressive Speech
Problem statement: Appropriate modeling of the fundamental frequency (F0) contour of speech is a key factor in preserving the quality of speech prosody. One successful approach has been developed for the tonal language Mandarin Chinese. It is based on the assumption that the behavioral characteristics of vocal-fold elongation in vibration could be approximated by those of a simple forced vibrating sy...